Managing Retries

Introduction#

Digital applications receive millions of requests every second, and not all of them complete the request-response cycle; many possible failures can interrupt it. When a request fails, the client generally receives one of the following HTTP status codes:

| HTTP Status Code | Reason |
|---|---|
| 408 | Request timeout |
| 429 | Too many requests |
| 500 | Internal server error |
| 502 | Gateway or proxy server receives an unexpected response |
| 503 | Service temporarily unavailable |
| 504 | Gateway or proxy server timeout |

Although each status code has its own story, we can organize them into the following four categories. In the first two categories, the client doesn’t get a response from the server and needs to retry. In the last two categories, the client receives a response with an error and needs to retry after some modifications:

  • Request lost: The first scenario is when a request is initiated by the client but never received by the target service. It’s as if the request never occurred.

[Diagram: The client initiates a request that never reaches the target server; the client eventually sees Error 408 (request timeout)]

In the scenario above, the client doesn’t get a response. For example, requests can get lost due to congestion at some router while en route to the service.

  • Response lost: The second scenario is where the client initiates the request, and the target service processes it. The service then sends a response, which never arrives at the requesting client.

[Diagram: A response forwarded by the server never reaches the requesting client; the client eventually sees Error 408 (request timeout)]
  • Service unresponsive: The third scenario is when the target service can’t process the request appropriately, possibly because of an error on the downstream service. The error status and the appropriate message are sent back to the client.

[Diagram: The request reaches the target service, but the service fails to process it and returns a 5xx error response]

  • Response with error code: The fourth scenario is when the client receives an error response from the target service due to a bad request or bad parameters. In this case, the client must address the error before retrying.

[Diagram: The client receives a response from the server with an error (400: Bad request)]

In the first two scenarios discussed above, the client can’t tell which case occurred because no response has been received; the client application simply reports that the connection to the service has failed. In the third scenario, the server sends an error response back to the client when it can’t process the request. In the first three scenarios, the only option the client has is to retransmit the same request. This process is called a retry. The last scenario requires clients to address the errors in the request before retrying.

The following section will explain the side effects of retries. We’ll also emphasize the issues that arise when a client resends the same request without fixing the errors reported by the server.

How do retries lead to issues?#

This section primarily highlights the issues that arise because of retries. Although a rate limiter caps how many requests (new or retried) a client can make in a specified time window, most of these issues arise before the rate-limit threshold is reached. Let's go through the issues below:

  • In the first three scenarios mentioned above, a request either never reaches the server, or it reaches the server but the client gets no response or receives a server-side error. The natural reaction from the client is to retry, which can lead to unnecessary utilization of computational resources at the service.

  • Another scenario is where numerous clients perform retries simultaneously. As a result, the network might be congested when many requests try to reach the target service concurrently.

  • An important case is when the client sends useless retry requests, requests for which the server's response will never change. For instance, a bad request will always produce the same failed response. In such cases, the client should stop retrying: the repeated requests accomplish nothing with the data, yet they waste bandwidth, overburden the server, and can cause network congestion.

  • Sending an excessive number of retries can hurt the client as well. For free APIs, excessive retries might get the client's requests throttled; for paid services, the client might be wasting their API call quota.

The following section will explain how retries work and solve the discussed issues.

How do we manage retries?#

In the section above, we have discussed scenarios explaining when clients need to retry and when too many retries create a mess. This section will discuss solutions to prevent servers from being overburdened and avoid network congestion. We’ll also look at some techniques for discouraging retries where they aren’t needed.

The backoff algorithm #

One solution is to add a delay to each client's retry so that the client does not immediately run into the same failure again. We can compute this delay using the exponential backoff algorithm. This algorithm uses a multiplication factor to reduce the retry rate; in other words, the interval increases with each attempt. Let's take a look at the formula that’s used to reduce the retry rate:

time = m^c

Here, time represents the delay before the next retry, m indicates the multiplication factor, and c is the count of how many times a specified event (HTTP status code) has occurred. This solution works fine, assuming clients are not retrying simultaneously. But what if hundreds of clients retry concurrently, or their retries cluster closely together while only some requests proceed, depending on the server's capacity? Such synchronized bursts of retries might overshoot the currently available capacity and needlessly trigger elastic scaling. This is called the thundering herd problem, and controlling the frequency and timing of retries is a way to use computational resources efficiently. Let's understand this with the help of the following illustration:

A thundering herd problem where the server receives retries concurrently from many clients (m=2, c=0, 1, 2, 3 ...)
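The formula, and the synchrony it causes, can be sketched in a few lines of Python. The function name `backoff_delay` is an illustrative assumption, not from any particular library:

```python
def backoff_delay(m: float, c: int) -> float:
    """Delay before the next retry, following time = m ** c."""
    return m ** c

# With a multiplication factor of m = 2, the wait grows with every
# occurrence of the failure: 1, 2, 4, 8, ... seconds.
print([backoff_delay(2, c) for c in range(4)])  # [1, 2, 4, 8]
```

Note the downside visible here: every client that fails at the same moment computes the identical schedule, so they all retry at the same moments — exactly the synchrony behind the thundering herd problem.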

The backoff algorithm with random jitter#

We can address the problem above by adding a random jitter to each client's backoff delay, instead of having every client wait exactly the time calculated by the backoff algorithm. The random jitter is computed separately and added to the value obtained from the backoff algorithm. By doing this, we spread out the retries and reduce the burden on the server. Let's take a look at the following illustration after adding random jitter:

Eliminating the thundering herd problem by adding random jitter with the backoff algorithm

We illustrated above how the combination of the backoff algorithm and random jitter provides the retry waiting time. In the illustration above, t0 indicates the time when clients make their first request, and t1 specifies a given client's previous retry time; for instance, t1 for Client 1 is t0 + x0. With jitter added, the clients no longer need to retry repeatedly, as shown in the illustration above.
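A minimal sketch of backoff combined with random jitter, assuming a jitter drawn uniformly from [0, max_jitter); the function name and parameter names are illustrative choices:

```python
import random

def backoff_with_jitter(m: float, c: int, max_jitter: float = 1.0) -> float:
    """Exponential backoff delay plus a random jitter.

    The random term desynchronizes clients that failed at the same
    moment, so their retries no longer arrive at the server together.
    """
    return m ** c + random.uniform(0.0, max_jitter)

# Two clients with the same failure history now wake at different times.
```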

Another approach is to cap the maximum number of retries within a certain period. The retries from various clients might all be processed the first time they reach the server; alternatively, the server can process a few requests the first time and the rest on the next retry.
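Putting the pieces together, here's a sketch of a client-side retry loop that caps the number of attempts and backs off with jitter. `TransientError`, the parameter names, and the injectable `sleep` are assumptions made for illustration, not part of any standard API:

```python
import random
import time

class TransientError(Exception):
    """A recoverable, temporary failure (e.g., a timeout or a 503)."""

def call_with_retries(request_fn, max_retries=5, m=2.0,
                      max_jitter=1.0, sleep=time.sleep):
    """Call request_fn, retrying transient failures at most
    max_retries times with jittered exponential backoff."""
    for c in range(max_retries):
        try:
            return request_fn()
        except TransientError:
            if c == max_retries - 1:
                raise  # retry budget exhausted; give up
            sleep(m ** c + random.uniform(0.0, max_jitter))
```

Making `sleep` injectable keeps the loop testable; in production, the default `time.sleep` applies.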

Let's discuss an example where the server processes retries from various clients in two cycles, assuming the retries are caused by transient errors, which are recoverable temporary errors such as a momentarily unavailable connection or a timeout. We’ll also see how this approach reduces the number of retries.

[Slideshow:
1 of 3 — The server fails all requests.
2 of 3 — The server processes Client 3's request on its first retry.
3 of 3 — The server processes all remaining retry requests.]

In the slides above, the server first processes the request Client 3 sends on its first retry. On the second retry, the server processes the remaining retried requests. As a result, breaking the synchrony of client requests helps manage the instantaneous load on the server and, on average, helps the clients by reducing retry-induced delays.

Note: Sometimes, the server can also specify the waiting time to the client for the next retry. This information is sent in the Retry-After header to the client. Although it isn’t practical for all discussed HTTP status codes, it is suitable for a few, such as the 429 and 503 status codes.
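When the server does send a Retry-After header, the client should honor it. Per the HTTP specification, the header carries either a number of seconds or an HTTP-date; a small parsing sketch (the function name is an illustrative assumption):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value: str) -> float:
    """Turn a Retry-After header value into a wait time in seconds.

    The value is either delta-seconds (e.g., "120") or an
    HTTP-date (e.g., "Wed, 21 Oct 2026 07:28:00 GMT").
    """
    try:
        return max(0.0, float(header_value))
    except ValueError:
        when = parsedate_to_datetime(header_value)
        return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
```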

Point to Ponder

Question

Why are we adding latency through jitter if it slows down the client?


The purpose of random jitter is to break the synchrony between different clients’ requests or stop a client from sending too many requests too soon (because if a service isn’t working, it might need some time to recover). It adds latency for specific calls, but on average, such a mechanism spreads out requests and provides lower average latency.

Preventing useless retries#

The next issue is useless retries, which HTTP status codes can help us address.

If a client receives a 4xx or 5xx status code other than those listed in the table at the start, it should not blindly retry. The client should first fix whatever can be fixed in the request, or wait long enough to give the service time to recover from its internal error. Resending the same request won’t change the response, but it can overload the target service and cause network congestion.

Let's discuss one example each of client-side and server-side errors. We’ll start with a client-side error: the client requests a particular resource, but the requested resource is no longer available. In this case, the client gets the 410 status code and doesn’t need to retry, because a retried request will get the same response from the server.

Let's take a look at an example of server-side errors where the client sends a request that does not satisfy the extension policy (resource access policy), and the server responds with a 510 status code. This response includes details such as the server not supporting extensions that are asked for in the request. If the client retries the same request, it will get the same response each time until the request is altered according to the details in the 510 response. Some familiar HTTP status codes where retrying is not useful are listed in the table below:

| HTTP Status Code (4xx) | Reason | HTTP Status Code (5xx) | Reason |
|---|---|---|---|
| 400 | Invalid request that does not follow the rules | 501 | Server does not support the functionality required by the request |
| 401 | Invalid authentication credentials | 505 | HTTP version not supported |
| 403 | Refusal to authorize the request | 507 | Insufficient storage |
| 404 | Requested page does not exist | 508 | Infinite loop detected |

Note: The list of status codes above is not exhaustive. The general guideline is that the error codes with 4xx are those where the client needs to fix something, while 5xx errors are those where the service needs to act to resolve the issue.
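Following the tables and the guideline above, a client can gate its retry logic on the status code. The sets below are drawn from this lesson's tables and, like them, are not exhaustive:

```python
# Transient failures from the first table: worth retrying (with backoff).
RETRYABLE = {408, 429, 500, 502, 503, 504}

def should_retry(status_code: int) -> bool:
    """Retry only when the failure is plausibly transient.

    Other 4xx codes mean the client must fix the request first;
    other 5xx codes mean the service itself has to act.
    """
    return status_code in RETRYABLE

print(should_retry(503))  # True
print(should_retry(400))  # False
```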

Summary #

Many things can go wrong in the request-response cycle of an API. Due to the nature of a distributed system, clients might not always be able to find the root cause of such issues. Often, a request retry is considered a remedy for all these problems. However, the improper use of retries can hurt the clients, the network, or the service in different ways. We discussed some techniques (request cap and backoff) to utilize the retry facility efficiently. Idempotency is a way to tackle the side effects of retries (such as potential duplication of work and data) that we’ll see in a different lesson.
